64 research outputs found
Wide-coverage deep statistical parsing using automatic dependency structure annotation
A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002)
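The significance testing mentioned in the abstract above can be illustrated with a minimal sketch of a paired approximate randomization test. This is not the authors' evaluation code: the function name, its parameters, and the use of a mean per-sentence score as the test statistic are assumptions made for illustration (a corpus-level f-score would normally be recomputed from aggregated dependency counts on each shuffle).

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test.

    scores_a / scores_b: per-sentence scores of two systems on the same
    sentences (hypothetical inputs). Returns an estimated p-value for the
    observed difference in mean score.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n

    at_least_as_extreme = 0
    for _ in range(trials):
        sum_a = sum_b = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap each pair of outputs with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            sum_a += a
            sum_b += b
        if abs(sum_a - sum_b) / n >= observed:
            at_least_as_extreme += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value suggests that the difference between the two systems is unlikely to arise from randomly reassigning their outputs; the number of trials and the seed are free choices in this sketch.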
Relationships between Specific Airway Resistance and Forced Expiratory Flows in Asthmatic Children
The first aim was to assess the relationships between forced expiratory flows and sRaw in a large group of asthmatic children in a transversal study. We then performed a longitudinal study in order to determine whether sRaw of preschool children could predict subsequent impairment of forced expiratory flows at school age. Pulmonary function tests (sRaw and forced expiratory flows) of 2193 asthmatic children were selected for a transversal analysis, while 365 children were retrospectively selected for longitudinal assessment from preschool to school age. (% predicted) (−0.09, 95% CI, −0.20 to 0). and could be used in preschool children to predict subsequent mild airflow limitation
Targeted Amplicon Sequencing (TAS): A Scalable Next-Gen Approach to Multilocus, Multitaxa Phylogenetics
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads, perform quality control checks, and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach
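The sorting of reads by 10-bp multiplexing identifier described above can be sketched roughly as follows. This is not the BarcodeCrucher implementation: the function names, the input shape, and the exact-match barcode lookup are assumptions for illustration, and the sketch omits quality control and assignment of reads to genes.

```python
from collections import defaultdict

MID_LENGTH = 10  # length of the multiplexing identifier, per the abstract


def demultiplex(reads, mid_to_sample):
    """Group reads by sample using the leading MID barcode.

    reads: iterable of (read_id, sequence) pairs (hypothetical input shape).
    mid_to_sample: dict mapping a 10-bp MID to a sample name.
    Returns (reads grouped by sample, list of unassigned read ids).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read_id, seq in reads:
        mid, insert = seq[:MID_LENGTH], seq[MID_LENGTH:]
        sample = mid_to_sample.get(mid)  # exact match only, no error tolerance
        if sample is None or not insert:
            unassigned.append(read_id)
        else:
            by_sample[sample].append((read_id, insert))
    return dict(by_sample), unassigned


def to_fasta(records):
    """Render (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)
```

A production tool would likely also tolerate sequencing errors in the barcode and separate reads by primer or locus; those steps are left out here.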
Dorsoventral Patterning in Hemichordates: Insights into Early Chordate Evolution
We have compared the dorsoventral development of hemichordates and chordates to deduce the organization of their common ancestor, and hence to identify the evolutionary modifications of the chordate body axis after the lineages split. In the hemichordate embryo, genes encoding bone morphogenetic proteins (Bmp) 2/4 and 5/8, as well as several genes for modulators of Bmp activity, are expressed in a thin stripe of ectoderm on one midline, historically called "dorsal." On the opposite midline, the genes encoding Chordin and Anti-dorsalizing morphogenetic protein (Admp) are expressed. Thus, we find a Bmp-Chordin developmental axis preceding and underlying the anatomical dorsoventral axis of hemichordates, adding to the evidence from Drosophila and chordates that this axis may be at least as ancient as the first bilateral animals. Numerous genes encoding transcription factors and signaling ligands are expressed in the three germ layers of hemichordate embryos in distinct dorsoventral domains, such as pox neuro, pituitary homeobox, distalless, and tbx2/3 on the Bmp side and netrin, mnx, mox, and single-minded on the Chordin-Admp side. When we expose the embryo to excess Bmp protein, or when we deplete endogenous Bmp by small interfering RNA injections, these expression domains expand or contract, reflecting their activation or repression by Bmp, and the embryos develop as dorsalized or ventralized limit forms. Dorsoventral patterning is independent of anterior/posterior patterning, as in Drosophila but not chordates. Unlike both chordates and Drosophila, neural gene expression in hemichordates is not repressed by high Bmp levels, consistent with their development of a diffuse rather than centralized nervous system.
We suggest that the common ancestor of hemichordates and chordates did not use its Bmp-Chordin axis to segregate epidermal and neural ectoderm but to pattern many other dorsoventral aspects of the germ layers, including neural cell fates within a diffuse nervous system. Accordingly, centralization was added in the chordate line by neural-epidermal segregation, mediated by the pre-existing Bmp-Chordin axis. Finally, since hemichordates develop the mouth on the non-Bmp side, like arthropods but opposite to chordates, the mouth and Bmp-Chordin axis may have rearranged in the chordate line, one relative to the other
Lifetime measurements of excited states in ¹⁶³W and the implications for the anomalous B(E2) ratios in transitional nuclei
This letter reports lifetime measurements of excited states in the odd-N nucleus ¹⁶³W using the recoil-distance Doppler shift method to probe the core polarising effect of the i13/2 neutron orbital on the underlying soft triaxial even-even core. The ratio B(E2:21/2⁺ → 17/2⁺)/B(E2:17/2⁺ → 13/2⁺) is consistent with the predictions of the collective rotational model. The deduced B(E2) values provide insights into the validity of collective model predictions for heavy transitional nuclei and a geometric origin for the anomalous B(E2) ratios observed in nearby even-even nuclei is proposed
Unsupervised Entailment Detection between Dependency Graph Fragments
Entailment detection systems are generally
designed to work either on single words, relations
or full sentences. We propose a new
task, detecting entailment between dependency
graph fragments of any type, which
relaxes these restrictions and leads to much
wider entailment discovery. An unsupervised
framework is described that uses intrinsic similarity,
multi-level extrinsic similarity and the
detection of negation and hedged language to
assign a confidence score to entailment relations
between two fragments. The final system
achieves 84.1% average precision on a data set
of entailment examples from the biomedical
domain
Artificial Error Generation with Machine Translation and Syntactic Patterns
Shortage of available training data is holding back progress in the area of automated error detection. This paper investigates two alternative methods for artificially generating writing errors, in order to create additional resources. We propose treating error generation as a machine translation task, where grammatically correct text is translated to contain errors. In addition, we explore a system for extracting textual patterns from an annotated corpus, which can then be used to insert errors into grammatically correct sentences. Our experiments show that the inclusion of artificially generated errors significantly improves error detection accuracy on both FCE and CoNLL 2014 datasets
An Error-Oriented Approach to Word Embedding Pre-Training
We propose a novel word embedding pre-training approach that exploits writing errors in learners' scripts. We compare our method to previous models that tune the embeddings based on script scores and the discrimination between correct and corrupt word contexts, in addition to the generic commonly-used embeddings pre-trained on large corpora. The comparison is achieved by using the aforementioned models to bootstrap a neural network that learns to predict a holistic score for scripts. Furthermore, we investigate augmenting our model with error corrections and monitor the impact on performance. Our results show that our error-oriented approach outperforms other comparable ones, which is further demonstrated when training on more data. Additionally, extending the model with corrections provides further performance gains when data sparsity is an issue
Curriculum Q-Learning for Visual Vocabulary Acquisition
The structure of curriculum plays a vital role in our learning process, both
as children and adults. Presenting material in ascending order of difficulty
that also exploits prior knowledge can have a significant impact on the rate of
learning. However, the notion of difficulty and prior knowledge differs from
person to person. Motivated by the need for a personalised curriculum, we
present a novel method of curriculum learning for vocabulary words in the form
of visual prompts. We employ a reinforcement learning model grounded in
pedagogical theories that emulates the actions of a tutor. We simulate three
students with different levels of vocabulary knowledge in order to evaluate
how well our model adapts to the environment. The results of the simulation
reveal that through interaction, the model is able to identify the areas of
weakness, as well as push students to the edge of their ZPD. We hypothesise
that these methods can also be effective in training agents to learn language
representations in a simulated environment where it has previously been shown
that order of words and prior knowledge play an important role in the efficacy
of language learning